Overview

Dataset statistics

Number of variables: 21
Number of observations: 10127
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.9 MiB
Average record size in memory: 610.2 B

Variable types

Numeric: 15
Categorical: 6

Warnings

Credit_Limit is highly correlated with Avg_Open_To_Buy (High correlation)
Avg_Open_To_Buy is highly correlated with Credit_Limit (High correlation)
CLIENTNUM has unique values (Unique)
Dependent_count has 904 (8.9%) zeros (Zeros)
Contacts_Count_12_mon has 399 (3.9%) zeros (Zeros)
Total_Revolving_Bal has 2470 (24.4%) zeros (Zeros)
Avg_Utilization_Ratio has 2470 (24.4%) zeros (Zeros)
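
One possible reading of these warnings, sketched in Python under stated assumptions: the sample rows at the end of this report suggest that Avg_Open_To_Buy equals Credit_Limit minus Total_Revolving_Bal, and that Avg_Utilization_Ratio equals Total_Revolving_Bal divided by Credit_Limit, rounded to three decimals. The report does not state either identity. The tuples below are transcribed from the first five sample rows, and `check_row` is a hypothetical helper written for this illustration.

```python
# Assumption: values transcribed from the "First rows" sample of this report, in the order
# (Credit_Limit, Total_Revolving_Bal, Avg_Open_To_Buy, Avg_Utilization_Ratio).
rows = [
    (12691.0, 777, 11914.0, 0.061),
    (8256.0, 864, 7392.0, 0.105),
    (3418.0, 0, 3418.0, 0.000),
    (3313.0, 2517, 796.0, 0.760),
    (4716.0, 0, 4716.0, 0.000),
]


def check_row(limit, revolving, open_to_buy, utilization, tol=5e-4):
    """Return True when both assumed identities hold for one row."""
    same_open = abs((limit - revolving) - open_to_buy) < 1e-9
    # The ratio is reported with three decimals, so allow a rounding tolerance.
    same_util = abs(revolving / limit - utilization) <= tol
    return same_open and same_util


all_consistent = all(check_row(*row) for row in rows)
print(all_consistent)
```

If the identities hold across the whole dataset, they would account for both the high correlation between Credit_Limit and Avg_Open_To_Buy and the identical zero counts (2470) in Total_Revolving_Bal and Avg_Utilization_Ratio. Five rows cannot confirm that, so the full data would need to be checked.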

Reproduction

Analysis started: 2021-03-04 15:59:19.923170
Analysis finished: 2021-03-04 16:00:40.649633
Duration: 1 minute and 20.73 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

CLIENTNUM
Real number (ℝ≥0)

UNIQUE

Distinct: 10127
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 739177606.3
Minimum: 708082083
Maximum: 828343083
Zeros: 0
Zeros (%): 0.0%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 708082083
5-th percentile: 709120390.5
Q1: 713036770.5
median: 717926358
Q3: 773143533
95-th percentile: 814212033
Maximum: 828343083
Range: 120261000
Interquartile range (IQR): 60106762.5

Descriptive statistics

Standard deviation: 36903783.45
Coefficient of variation (CV): 0.04992546194
Kurtosis: -0.6156397044
Mean: 739177606.3
Median Absolute Deviation (MAD): 6347700
Skewness: 0.9956010103
Sum: 7.485651619 × 10^12
Variance: 1.361889233 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
780097533 | 1 | < 0.1%
720049083 | 1 | < 0.1%
717376758 | 1 | < 0.1%
720598308 | 1 | < 0.1%
719930658 | 1 | < 0.1%
716805408 | 1 | < 0.1%
819997983 | 1 | < 0.1%
778191933 | 1 | < 0.1%
824165658 | 1 | < 0.1%
771220758 | 1 | < 0.1%
Other values (10117) | 10117 | 99.9%

Value | Count | Frequency (%)
708082083 | 1 | < 0.1%
708083283 | 1 | < 0.1%
708084558 | 1 | < 0.1%
708085458 | 1 | < 0.1%
708086958 | 1 | < 0.1%
708095133 | 1 | < 0.1%
708098133 | 1 | < 0.1%
708099183 | 1 | < 0.1%
708100533 | 1 | < 0.1%
708103608 | 1 | < 0.1%

Value | Count | Frequency (%)
828343083 | 1 | < 0.1%
828298908 | 1 | < 0.1%
828294933 | 1 | < 0.1%
828291858 | 1 | < 0.1%
828288333 | 1 | < 0.1%
828285858 | 1 | < 0.1%
828281733 | 1 | < 0.1%
828236133 | 1 | < 0.1%
828227433 | 1 | < 0.1%
828215508 | 1 | < 0.1%

Attrition_Flag
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Existing Customer: 8500
Attrited Customer: 1627

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Characters and Unicode

Total characters: 172159
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Existing Customer
2nd row: Existing Customer
3rd row: Existing Customer
4th row: Existing Customer
5th row: Existing Customer

Value | Count | Frequency (%)
Existing Customer | 8500 | 83.9%
Attrited Customer | 1627 | 16.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
customer | 10127 | 50.0%
existing | 8500 | 42.0%
attrited | 1627 | 8.0%

Most occurring characters

ValueCountFrequency (%)
t23508
13.7%
i18627
10.8%
s18627
10.8%
e11754
 
6.8%
r11754
 
6.8%
10127
 
5.9%
C10127
 
5.9%
u10127
 
5.9%
o10127
 
5.9%
m10127
 
5.9%
Other values (6)37254
21.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter141778
82.4%
Uppercase Letter20254
 
11.8%
Space Separator10127
 
5.9%

Most frequent character per category

ValueCountFrequency (%)
t23508
16.6%
i18627
13.1%
s18627
13.1%
e11754
8.3%
r11754
8.3%
u10127
7.1%
o10127
7.1%
m10127
7.1%
x8500
 
6.0%
n8500
 
6.0%
Other values (2)10127
7.1%
ValueCountFrequency (%)
C10127
50.0%
E8500
42.0%
A1627
 
8.0%
ValueCountFrequency (%)
10127
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin162032
94.1%
Common10127
 
5.9%

Most frequent character per script

ValueCountFrequency (%)
t23508
14.5%
i18627
11.5%
s18627
11.5%
e11754
7.3%
r11754
7.3%
C10127
 
6.2%
u10127
 
6.2%
o10127
 
6.2%
m10127
 
6.2%
E8500
 
5.2%
Other values (5)28754
17.7%
ValueCountFrequency (%)
10127
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII172159
100.0%

Most frequent character per block

ValueCountFrequency (%)
t23508
13.7%
i18627
10.8%
s18627
10.8%
e11754
 
6.8%
r11754
 
6.8%
10127
 
5.9%
C10127
 
5.9%
u10127
 
5.9%
o10127
 
5.9%
m10127
 
5.9%
Other values (6)37254
21.6%

Customer_Age
Real number (ℝ≥0)

Distinct: 45
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.3259603
Minimum: 26
Maximum: 73
Zeros: 0
Zeros (%): 0.0%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 26
5-th percentile: 33
Q1: 41
median: 46
Q3: 52
95-th percentile: 60
Maximum: 73
Range: 47
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 8.016814033
Coefficient of variation (CV): 0.1730523011
Kurtosis: -0.2886199153
Mean: 46.3259603
Median Absolute Deviation (MAD): 6
Skewness: -0.03360501632
Sum: 469143
Variance: 64.26930723
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=45)
Value | Count | Frequency (%)
44 | 500 | 4.9%
49 | 495 | 4.9%
46 | 490 | 4.8%
45 | 486 | 4.8%
47 | 479 | 4.7%
43 | 473 | 4.7%
48 | 472 | 4.7%
50 | 452 | 4.5%
42 | 426 | 4.2%
51 | 398 | 3.9%
Other values (35) | 5456 | 53.9%

Value | Count | Frequency (%)
26 | 78 | 0.8%
27 | 32 | 0.3%
28 | 29 | 0.3%
29 | 56 | 0.6%
30 | 70 | 0.7%
31 | 91 | 0.9%
32 | 106 | 1.0%
33 | 127 | 1.3%
34 | 146 | 1.4%
35 | 184 | 1.8%

Value | Count | Frequency (%)
73 | 1 | < 0.1%
70 | 1 | < 0.1%
68 | 2 | < 0.1%
67 | 4 | < 0.1%
66 | 2 | < 0.1%
65 | 101 | 1.0%
64 | 43 | 0.4%
63 | 65 | 0.6%
62 | 93 | 0.9%
61 | 93 | 0.9%
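
Several of the statistics reported for each numeric variable are derived from others. As a minimal sketch, assuming the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared), the Customer_Age figures can be cross-checked as follows. The variable names are illustrative and not part of the report.

```python
# Values copied from the Customer_Age statistics in this report.
mean, std = 46.3259603, 8.016814033
minimum, q1, q3, maximum = 26, 41, 52, 73

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance as the squared standard deviation

print(round(cv, 4), iqr, value_range, round(variance, 2))
```

The results agree with the reported CV (0.1730523011), IQR (11), range (47) and variance (64.26930723) up to rounding.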

Gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 613.3 KiB
F: 5358
M: 4769

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 10127
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: M
4th row: F
5th row: M

Value | Count | Frequency (%)
F | 5358 | 52.9%
M | 4769 | 47.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
f | 5358 | 52.9%
m | 4769 | 47.1%

Most occurring characters

ValueCountFrequency (%)
F5358
52.9%
M4769
47.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter10127
100.0%

Most frequent character per category

ValueCountFrequency (%)
F5358
52.9%
M4769
47.1%

Most occurring scripts

ValueCountFrequency (%)
Latin10127
100.0%

Most frequent character per script

ValueCountFrequency (%)
F5358
52.9%
M4769
47.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII10127
100.0%

Most frequent character per block

ValueCountFrequency (%)
F5358
52.9%
M4769
47.1%

Dependent_count
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.346203219
Minimum: 0
Maximum: 5
Zeros: 904
Zeros (%): 8.9%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.298908349
Coefficient of variation (CV): 0.5536214162
Kurtosis: -0.6830166531
Mean: 2.346203219
Median Absolute Deviation (MAD): 1
Skewness: -0.02082553562
Sum: 23760
Variance: 1.687162899
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%)
3 | 2732 | 27.0%
2 | 2655 | 26.2%
1 | 1838 | 18.1%
4 | 1574 | 15.5%
0 | 904 | 8.9%
5 | 424 | 4.2%

Value | Count | Frequency (%)
0 | 904 | 8.9%
1 | 1838 | 18.1%
2 | 2655 | 26.2%
3 | 2732 | 27.0%
4 | 1574 | 15.5%
5 | 424 | 4.2%

Value | Count | Frequency (%)
5 | 424 | 4.2%
4 | 1574 | 15.5%
3 | 2732 | 27.0%
2 | 2655 | 26.2%
1 | 1838 | 18.1%
0 | 904 | 8.9%

Education_Level
Categorical

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 848.8 KiB
Graduate: 3128
High School: 2013
Unknown: 1519
Uneducated: 1487
College: 1013
Other values (2): 967

Length

Max length: 13
Median length: 8
Mean length: 8.939271255
Min length: 7

Characters and Unicode

Total characters: 90528
Distinct characters: 25
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: High School
2nd row: Graduate
3rd row: Graduate
4th row: High School
5th row: Uneducated

Value | Count | Frequency (%)
Graduate | 3128 | 30.9%
High School | 2013 | 19.9%
Unknown | 1519 | 15.0%
Uneducated | 1487 | 14.7%
College | 1013 | 10.0%
Post-Graduate | 516 | 5.1%
Doctorate | 451 | 4.5%

Histogram of lengths of the category

Value | Count | Frequency (%)
graduate | 3128 | 25.8%
school | 2013 | 16.6%
high | 2013 | 16.6%
unknown | 1519 | 12.5%
uneducated | 1487 | 12.2%
college | 1013 | 8.3%
post-graduate | 516 | 4.3%
doctorate | 451 | 3.7%

Most occurring characters

ValueCountFrequency (%)
a9226
 
10.2%
e9095
 
10.0%
o7976
 
8.8%
d6618
 
7.3%
t6549
 
7.2%
n6044
 
6.7%
u5131
 
5.7%
r4095
 
4.5%
l4039
 
4.5%
h4026
 
4.4%
Other values (15)27729
30.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter75343
83.2%
Uppercase Letter12656
 
14.0%
Space Separator2013
 
2.2%
Dash Punctuation516
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
a9226
12.2%
e9095
12.1%
o7976
10.6%
d6618
8.8%
t6549
8.7%
n6044
8.0%
u5131
6.8%
r4095
 
5.4%
l4039
 
5.4%
h4026
 
5.3%
Other values (6)12544
16.6%
ValueCountFrequency (%)
G3644
28.8%
U3006
23.8%
H2013
15.9%
S2013
15.9%
C1013
 
8.0%
P516
 
4.1%
D451
 
3.6%
ValueCountFrequency (%)
2013
100.0%
ValueCountFrequency (%)
-516
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin87999
97.2%
Common2529
 
2.8%

Most frequent character per script

ValueCountFrequency (%)
a9226
 
10.5%
e9095
 
10.3%
o7976
 
9.1%
d6618
 
7.5%
t6549
 
7.4%
n6044
 
6.9%
u5131
 
5.8%
r4095
 
4.7%
l4039
 
4.6%
h4026
 
4.6%
Other values (13)25200
28.6%
ValueCountFrequency (%)
2013
79.6%
-516
 
20.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII90528
100.0%

Most frequent character per block

ValueCountFrequency (%)
a9226
 
10.2%
e9095
 
10.0%
o7976
 
8.8%
d6618
 
7.3%
t6549
 
7.2%
n6044
 
6.7%
u5131
 
5.7%
r4095
 
4.5%
l4039
 
4.5%
h4026
 
4.4%
Other values (15)27729
30.6%

Marital_Status
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.9 KiB
Married: 4687
Single: 3943
Unknown: 749
Divorced: 748

Length

Max length: 8
Median length: 7
Mean length: 6.684506764
Min length: 6

Characters and Unicode

Total characters: 67694
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Married
2nd row: Single
3rd row: Married
4th row: Unknown
5th row: Married

Value | Count | Frequency (%)
Married | 4687 | 46.3%
Single | 3943 | 38.9%
Unknown | 749 | 7.4%
Divorced | 748 | 7.4%

Histogram of lengths of the category

Value | Count | Frequency (%)
married | 4687 | 46.3%
single | 3943 | 38.9%
unknown | 749 | 7.4%
divorced | 748 | 7.4%

Most occurring characters

ValueCountFrequency (%)
r10122
15.0%
i9378
13.9%
e9378
13.9%
n6190
9.1%
d5435
8.0%
M4687
6.9%
a4687
6.9%
S3943
 
5.8%
g3943
 
5.8%
l3943
 
5.8%
Other values (7)5988
8.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter57567
85.0%
Uppercase Letter10127
 
15.0%

Most frequent character per category

ValueCountFrequency (%)
r10122
17.6%
i9378
16.3%
e9378
16.3%
n6190
10.8%
d5435
9.4%
a4687
8.1%
g3943
 
6.8%
l3943
 
6.8%
o1497
 
2.6%
k749
 
1.3%
Other values (3)2245
 
3.9%
ValueCountFrequency (%)
M4687
46.3%
S3943
38.9%
U749
 
7.4%
D748
 
7.4%

Most occurring scripts

ValueCountFrequency (%)
Latin67694
100.0%

Most frequent character per script

ValueCountFrequency (%)
r10122
15.0%
i9378
13.9%
e9378
13.9%
n6190
9.1%
d5435
8.0%
M4687
6.9%
a4687
6.9%
S3943
 
5.8%
g3943
 
5.8%
l3943
 
5.8%
Other values (7)5988
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII67694
100.0%

Most frequent character per block

ValueCountFrequency (%)
r10122
15.0%
i9378
13.9%
e9378
13.9%
n6190
9.1%
d5435
8.0%
M4687
6.9%
a4687
6.9%
S3943
 
5.8%
g3943
 
5.8%
l3943
 
5.8%
Other values (7)5988
8.8%

Income_Category
Categorical

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 810.4 KiB
Less than $40K: 3561
$40K - $60K: 1790
$80K - $120K: 1535
$60K - $80K: 1402
Unknown: 1112

Length

Max length: 14
Median length: 12
Mean length: 11.4801027
Min length: 7

Characters and Unicode

Total characters: 116259
Distinct characters: 22
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: $60K - $80K
2nd row: Less than $40K
3rd row: $80K - $120K
4th row: Less than $40K
5th row: $60K - $80K

Value | Count | Frequency (%)
Less than $40K | 3561 | 35.2%
$40K - $60K | 1790 | 17.7%
$80K - $120K | 1535 | 15.2%
$60K - $80K | 1402 | 13.8%
Unknown | 1112 | 11.0%
$120K + | 727 | 7.2%

Histogram of lengths of the category

Value | Count | Frequency (%)
(blank) | 5454 | 19.9%
40k | 5351 | 19.5%
less | 3561 | 13.0%
than | 3561 | 13.0%
60k | 3192 | 11.6%
80k | 2937 | 10.7%
120k | 2262 | 8.2%
unknown | 1112 | 4.1%

Most occurring characters

ValueCountFrequency (%)
17303
14.9%
$13742
11.8%
013742
11.8%
K13742
11.8%
s7122
 
6.1%
n6897
 
5.9%
45351
 
4.6%
-4727
 
4.1%
L3561
 
3.1%
e3561
 
3.1%
Other values (12)26511
22.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter31599
27.2%
Decimal Number29746
25.6%
Uppercase Letter18415
15.8%
Space Separator17303
14.9%
Currency Symbol13742
11.8%
Dash Punctuation4727
 
4.1%
Math Symbol727
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
s7122
22.5%
n6897
21.8%
e3561
11.3%
t3561
11.3%
h3561
11.3%
a3561
11.3%
k1112
 
3.5%
o1112
 
3.5%
w1112
 
3.5%
ValueCountFrequency (%)
013742
46.2%
45351
 
18.0%
63192
 
10.7%
82937
 
9.9%
12262
 
7.6%
22262
 
7.6%
ValueCountFrequency (%)
K13742
74.6%
L3561
 
19.3%
U1112
 
6.0%
ValueCountFrequency (%)
$13742
100.0%
ValueCountFrequency (%)
17303
100.0%
ValueCountFrequency (%)
-4727
100.0%
ValueCountFrequency (%)
+727
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common66245
57.0%
Latin50014
43.0%

Most frequent character per script

ValueCountFrequency (%)
K13742
27.5%
s7122
14.2%
n6897
13.8%
L3561
 
7.1%
e3561
 
7.1%
t3561
 
7.1%
h3561
 
7.1%
a3561
 
7.1%
U1112
 
2.2%
k1112
 
2.2%
Other values (2)2224
 
4.4%
ValueCountFrequency (%)
17303
26.1%
$13742
20.7%
013742
20.7%
45351
 
8.1%
-4727
 
7.1%
63192
 
4.8%
82937
 
4.4%
12262
 
3.4%
22262
 
3.4%
+727
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII116259
100.0%

Most frequent character per block

ValueCountFrequency (%)
17303
14.9%
$13742
11.8%
013742
11.8%
K13742
11.8%
s7122
 
6.1%
n6897
 
5.9%
45351
 
4.6%
-4727
 
4.1%
L3561
 
3.1%
e3561
 
3.1%
Other values (12)26511
22.8%

Card_Category
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 705.8 KiB
Blue: 9436
Silver: 555
Gold: 116
Platinum: 20

Length

Max length: 8
Median length: 4
Mean length: 4.117507653
Min length: 4

Characters and Unicode

Total characters: 41698
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Blue
2nd row: Blue
3rd row: Blue
4th row: Blue
5th row: Blue

Value | Count | Frequency (%)
Blue | 9436 | 93.2%
Silver | 555 | 5.5%
Gold | 116 | 1.1%
Platinum | 20 | 0.2%

Histogram of lengths of the category

Value | Count | Frequency (%)
blue | 9436 | 93.2%
silver | 555 | 5.5%
gold | 116 | 1.1%
platinum | 20 | 0.2%

Most occurring characters

ValueCountFrequency (%)
l10127
24.3%
e9991
24.0%
u9456
22.7%
B9436
22.6%
i575
 
1.4%
S555
 
1.3%
v555
 
1.3%
r555
 
1.3%
G116
 
0.3%
o116
 
0.3%
Other values (6)216
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter31571
75.7%
Uppercase Letter10127
 
24.3%

Most frequent character per category

ValueCountFrequency (%)
l10127
32.1%
e9991
31.6%
u9456
30.0%
i575
 
1.8%
v555
 
1.8%
r555
 
1.8%
o116
 
0.4%
d116
 
0.4%
a20
 
0.1%
t20
 
0.1%
Other values (2)40
 
0.1%
ValueCountFrequency (%)
B9436
93.2%
S555
 
5.5%
G116
 
1.1%
P20
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin41698
100.0%

Most frequent character per script

ValueCountFrequency (%)
l10127
24.3%
e9991
24.0%
u9456
22.7%
B9436
22.6%
i575
 
1.4%
S555
 
1.3%
v555
 
1.3%
r555
 
1.3%
G116
 
0.3%
o116
 
0.3%
Other values (6)216
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII41698
100.0%

Most frequent character per block

ValueCountFrequency (%)
l10127
24.3%
e9991
24.0%
u9456
22.7%
B9436
22.6%
i575
 
1.4%
S555
 
1.3%
v555
 
1.3%
r555
 
1.3%
G116
 
0.3%
o116
 
0.3%
Other values (6)216
 
0.5%

Months_on_book
Real number (ℝ≥0)

Distinct: 44
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.9284092
Minimum: 13
Maximum: 56
Zeros: 0
Zeros (%): 0.0%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 13
5-th percentile: 22
Q1: 31
median: 36
Q3: 40
95-th percentile: 50
Maximum: 56
Range: 43
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 7.986416331
Coefficient of variation (CV): 0.2222869453
Kurtosis: 0.4001001202
Mean: 35.9284092
Median Absolute Deviation (MAD): 4
Skewness: -0.1065653599
Sum: 363847
Variance: 63.78284581
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=44)
Value | Count | Frequency (%)
36 | 2463 | 24.3%
37 | 358 | 3.5%
34 | 353 | 3.5%
38 | 347 | 3.4%
39 | 341 | 3.4%
40 | 333 | 3.3%
31 | 318 | 3.1%
35 | 317 | 3.1%
33 | 305 | 3.0%
30 | 300 | 3.0%
Other values (34) | 4692 | 46.3%

Value | Count | Frequency (%)
13 | 70 | 0.7%
14 | 16 | 0.2%
15 | 34 | 0.3%
16 | 29 | 0.3%
17 | 39 | 0.4%
18 | 58 | 0.6%
19 | 63 | 0.6%
20 | 74 | 0.7%
21 | 83 | 0.8%
22 | 105 | 1.0%

Value | Count | Frequency (%)
56 | 103 | 1.0%
55 | 42 | 0.4%
54 | 53 | 0.5%
53 | 78 | 0.8%
52 | 62 | 0.6%
51 | 80 | 0.8%
50 | 96 | 0.9%
49 | 141 | 1.4%
48 | 162 | 1.6%
47 | 171 | 1.7%

Total_Relationship_Count
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.812580231
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 4
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.554407865
Coefficient of variation (CV): 0.4077049586
Kurtosis: -1.006130507
Mean: 3.812580231
Median Absolute Deviation (MAD): 1
Skewness: -0.162452415
Sum: 38610
Variance: 2.416183812
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%)
3 | 2305 | 22.8%
4 | 1912 | 18.9%
5 | 1891 | 18.7%
6 | 1866 | 18.4%
2 | 1243 | 12.3%
1 | 910 | 9.0%

Value | Count | Frequency (%)
1 | 910 | 9.0%
2 | 1243 | 12.3%
3 | 2305 | 22.8%
4 | 1912 | 18.9%
5 | 1891 | 18.7%
6 | 1866 | 18.4%

Value | Count | Frequency (%)
6 | 1866 | 18.4%
5 | 1891 | 18.7%
4 | 1912 | 18.9%
3 | 2305 | 22.8%
2 | 1243 | 12.3%
1 | 910 | 9.0%

Months_Inactive_12_mon
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.341167177
Minimum: 0
Maximum: 6
Zeros: 29
Zeros (%): 0.3%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 2
Q3: 3
95-th percentile: 4
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.010622399
Coefficient of variation (CV): 0.4316745978
Kurtosis: 1.098522614
Mean: 2.341167177
Median Absolute Deviation (MAD): 1
Skewness: 0.633061129
Sum: 23709
Variance: 1.021357634
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value | Count | Frequency (%)
3 | 3846 | 38.0%
2 | 3282 | 32.4%
1 | 2233 | 22.0%
4 | 435 | 4.3%
5 | 178 | 1.8%
6 | 124 | 1.2%
0 | 29 | 0.3%

Value | Count | Frequency (%)
0 | 29 | 0.3%
1 | 2233 | 22.0%
2 | 3282 | 32.4%
3 | 3846 | 38.0%
4 | 435 | 4.3%
5 | 178 | 1.8%
6 | 124 | 1.2%

Value | Count | Frequency (%)
6 | 124 | 1.2%
5 | 178 | 1.8%
4 | 435 | 4.3%
3 | 3846 | 38.0%
2 | 3282 | 32.4%
1 | 2233 | 22.0%
0 | 29 | 0.3%

Contacts_Count_12_mon
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.455317468
Minimum: 0
Maximum: 6
Zeros: 399
Zeros (%): 3.9%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 2
Q3: 3
95-th percentile: 4
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.106225143
Coefficient of variation (CV): 0.4505426109
Kurtosis: 0.0008626566254
Mean: 2.455317468
Median Absolute Deviation (MAD): 1
Skewness: 0.01100562622
Sum: 24865
Variance: 1.223734066
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value | Count | Frequency (%)
3 | 3380 | 33.4%
2 | 3227 | 31.9%
1 | 1499 | 14.8%
4 | 1392 | 13.7%
0 | 399 | 3.9%
5 | 176 | 1.7%
6 | 54 | 0.5%

Value | Count | Frequency (%)
0 | 399 | 3.9%
1 | 1499 | 14.8%
2 | 3227 | 31.9%
3 | 3380 | 33.4%
4 | 1392 | 13.7%
5 | 176 | 1.7%
6 | 54 | 0.5%

Value | Count | Frequency (%)
6 | 54 | 0.5%
5 | 176 | 1.7%
4 | 1392 | 13.7%
3 | 3380 | 33.4%
2 | 3227 | 31.9%
1 | 1499 | 14.8%
0 | 399 | 3.9%

Credit_Limit
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6205
Distinct (%): 61.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8631.953698
Minimum: 1438.3
Maximum: 34516
Zeros: 0
Zeros (%): 0.0%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 1438.3
5-th percentile: 1438.51
Q1: 2555
median: 4549
Q3: 11067.5
95-th percentile: 34516
Maximum: 34516
Range: 33077.7
Interquartile range (IQR): 8512.5

Descriptive statistics

Standard deviation: 9088.77665
Coefficient of variation (CV): 1.052922313
Kurtosis: 1.808989336
Mean: 8631.953698
Median Absolute Deviation (MAD): 2593
Skewness: 1.666725808
Sum: 87415795.1
Variance: 82605861
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
34516 | 508 | 5.0%
1438.3 | 507 | 5.0%
15987 | 18 | 0.2%
9959 | 18 | 0.2%
23981 | 12 | 0.1%
6224 | 11 | 0.1%
2490 | 11 | 0.1%
3735 | 11 | 0.1%
7469 | 10 | 0.1%
2069 | 8 | 0.1%
Other values (6195) | 9013 | 89.0%

Value | Count | Frequency (%)
1438.3 | 507 | 5.0%
1439 | 2 | < 0.1%
1440 | 1 | < 0.1%
1441 | 2 | < 0.1%
1442 | 1 | < 0.1%
1443 | 3 | < 0.1%
1446 | 1 | < 0.1%
1449 | 2 | < 0.1%
1451 | 2 | < 0.1%
1452 | 2 | < 0.1%

Value | Count | Frequency (%)
34516 | 508 | 5.0%
34496 | 1 | < 0.1%
34458 | 1 | < 0.1%
34427 | 1 | < 0.1%
34198 | 1 | < 0.1%
34173 | 1 | < 0.1%
34162 | 1 | < 0.1%
34140 | 1 | < 0.1%
34058 | 1 | < 0.1%
34010 | 1 | < 0.1%

Total_Revolving_Bal
Real number (ℝ≥0)

ZEROS

Distinct: 1974
Distinct (%): 19.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1162.814061
Minimum: 0
Maximum: 2517
Zeros: 2470
Zeros (%): 24.4%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 359
median: 1276
Q3: 1784
95-th percentile: 2517
Maximum: 2517
Range: 2517
Interquartile range (IQR): 1425

Descriptive statistics

Standard deviation: 814.9873352
Coefficient of variation (CV): 0.7008750257
Kurtosis: -1.145991782
Mean: 1162.814061
Median Absolute Deviation (MAD): 591
Skewness: -0.1488372503
Sum: 11775818
Variance: 664204.3566
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 2470 | 24.4%
2517 | 508 | 5.0%
1965 | 12 | 0.1%
1480 | 12 | 0.1%
1720 | 11 | 0.1%
1664 | 11 | 0.1%
1434 | 11 | 0.1%
1542 | 10 | 0.1%
1175 | 10 | 0.1%
1560 | 10 | 0.1%
Other values (1964) | 7062 | 69.7%

Value | Count | Frequency (%)
0 | 2470 | 24.4%
132 | 1 | < 0.1%
134 | 1 | < 0.1%
145 | 1 | < 0.1%
154 | 1 | < 0.1%
157 | 1 | < 0.1%
159 | 2 | < 0.1%
168 | 2 | < 0.1%
170 | 1 | < 0.1%
186 | 1 | < 0.1%

Value | Count | Frequency (%)
2517 | 508 | 5.0%
2514 | 3 | < 0.1%
2513 | 1 | < 0.1%
2512 | 2 | < 0.1%
2511 | 1 | < 0.1%
2509 | 2 | < 0.1%
2508 | 2 | < 0.1%
2507 | 4 | < 0.1%
2506 | 1 | < 0.1%
2505 | 3 | < 0.1%

Avg_Open_To_Buy
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6813
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7469.139637
Minimum: 3
Maximum: 34516
Zeros: 0
Zeros (%): 0.0%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 3
5-th percentile: 480.3
Q1: 1324.5
median: 3474
Q3: 9859
95-th percentile: 32183.4
Maximum: 34516
Range: 34513
Interquartile range (IQR): 8534.5

Descriptive statistics

Standard deviation: 9090.685324
Coefficient of variation (CV): 1.217099394
Kurtosis: 1.798617296
Mean: 7469.139637
Median Absolute Deviation (MAD): 2665
Skewness: 1.661696546
Sum: 75639977.1
Variance: 82640559.65
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1438.3 | 324 | 3.2%
34516 | 98 | 1.0%
31999 | 26 | 0.3%
787 | 8 | 0.1%
953 | 7 | 0.1%
701 | 7 | 0.1%
463 | 7 | 0.1%
713 | 7 | 0.1%
740 | 6 | 0.1%
933 | 6 | 0.1%
Other values (6803) | 9631 | 95.1%

Value | Count | Frequency (%)
3 | 1 | < 0.1%
10 | 1 | < 0.1%
14 | 2 | < 0.1%
15 | 1 | < 0.1%
24 | 1 | < 0.1%
28 | 1 | < 0.1%
29 | 1 | < 0.1%
36 | 1 | < 0.1%
39 | 2 | < 0.1%
41 | 2 | < 0.1%

Value | Count | Frequency (%)
34516 | 98 | 1.0%
34362 | 1 | < 0.1%
34302 | 1 | < 0.1%
34300 | 1 | < 0.1%
34297 | 1 | < 0.1%
34286 | 1 | < 0.1%
34238 | 1 | < 0.1%
34227 | 1 | < 0.1%
34140 | 1 | < 0.1%
34119 | 1 | < 0.1%

Total_Amt_Chng_Q4_Q1
Real number (ℝ≥0)

Distinct: 1158
Distinct (%): 11.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7599406537
Minimum: 0
Maximum: 3.397
Zeros: 5
Zeros (%): < 0.1%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.463
Q1: 0.631
median: 0.736
Q3: 0.859
95-th percentile: 1.103
Maximum: 3.397
Range: 3.397
Interquartile range (IQR): 0.228

Descriptive statistics

Standard deviation: 0.2192067692
Coefficient of variation (CV): 0.288452484
Kurtosis: 9.993501179
Mean: 0.7599406537
Median Absolute Deviation (MAD): 0.114
Skewness: 1.732063411
Sum: 7695.919
Variance: 0.04805160768
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.791 | 36 | 0.4%
0.743 | 34 | 0.3%
0.712 | 34 | 0.3%
0.735 | 33 | 0.3%
0.718 | 33 | 0.3%
0.722 | 32 | 0.3%
0.744 | 32 | 0.3%
0.699 | 32 | 0.3%
0.767 | 31 | 0.3%
0.69 | 31 | 0.3%
Other values (1148) | 9799 | 96.8%

Value | Count | Frequency (%)
0 | 5 | < 0.1%
0.01 | 1 | < 0.1%
0.018 | 1 | < 0.1%
0.046 | 1 | < 0.1%
0.061 | 2 | < 0.1%
0.072 | 1 | < 0.1%
0.101 | 1 | < 0.1%
0.12 | 1 | < 0.1%
0.153 | 1 | < 0.1%
0.163 | 1 | < 0.1%

Value | Count | Frequency (%)
3.397 | 1 | < 0.1%
3.355 | 1 | < 0.1%
2.675 | 1 | < 0.1%
2.594 | 1 | < 0.1%
2.368 | 1 | < 0.1%
2.357 | 1 | < 0.1%
2.316 | 1 | < 0.1%
2.282 | 1 | < 0.1%
2.275 | 1 | < 0.1%
2.271 | 1 | < 0.1%

Total_Trans_Amt
Real number (ℝ≥0)

Distinct: 5033
Distinct (%): 49.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4404.086304
Minimum: 510
Maximum: 18484
Zeros: 0
Zeros (%): 0.0%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 510
5-th percentile: 1283.3
Q1: 2155.5
median: 3899
Q3: 4741
95-th percentile: 14212
Maximum: 18484
Range: 17974
Interquartile range (IQR): 2585.5

Descriptive statistics

Standard deviation: 3397.129254
Coefficient of variation (CV): 0.7713584656
Kurtosis: 3.894023406
Mean: 4404.086304
Median Absolute Deviation (MAD): 1308
Skewness: 2.041003403
Sum: 44600182
Variance: 11540487.17
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4253 | 11 | 0.1%
4509 | 11 | 0.1%
2229 | 10 | 0.1%
4518 | 10 | 0.1%
4869 | 9 | 0.1%
4042 | 9 | 0.1%
4313 | 9 | 0.1%
4220 | 9 | 0.1%
4498 | 9 | 0.1%
4037 | 9 | 0.1%
Other values (5023) | 10031 | 99.1%

Value | Count | Frequency (%)
510 | 1 | < 0.1%
530 | 1 | < 0.1%
563 | 1 | < 0.1%
569 | 1 | < 0.1%
594 | 1 | < 0.1%
596 | 1 | < 0.1%
597 | 1 | < 0.1%
602 | 1 | < 0.1%
615 | 1 | < 0.1%
643 | 1 | < 0.1%

Value | Count | Frequency (%)
18484 | 1 | < 0.1%
17995 | 1 | < 0.1%
17744 | 1 | < 0.1%
17634 | 1 | < 0.1%
17628 | 1 | < 0.1%
17498 | 1 | < 0.1%
17437 | 1 | < 0.1%
17390 | 1 | < 0.1%
17350 | 1 | < 0.1%
17258 | 1 | < 0.1%

Total_Trans_Ct
Real number (ℝ≥0)

Distinct: 126
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.85869458
Minimum: 10
Maximum: 139
Zeros: 0
Zeros (%): 0.0%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 10
5-th percentile: 28
Q1: 45
median: 67
Q3: 81
95-th percentile: 105
Maximum: 139
Range: 129
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 23.47257045
Coefficient of variation (CV): 0.3619032206
Kurtosis: -0.3671632411
Mean: 64.85869458
Median Absolute Deviation (MAD): 17
Skewness: 0.1536730685
Sum: 656824
Variance: 550.9615635
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
81 | 208 | 2.1%
75 | 203 | 2.0%
71 | 203 | 2.0%
82 | 202 | 2.0%
69 | 202 | 2.0%
76 | 198 | 2.0%
77 | 197 | 1.9%
70 | 193 | 1.9%
78 | 190 | 1.9%
74 | 190 | 1.9%
Other values (116) | 8141 | 80.4%

Value | Count | Frequency (%)
10 | 4 | < 0.1%
11 | 2 | < 0.1%
12 | 4 | < 0.1%
13 | 5 | < 0.1%
14 | 9 | 0.1%
15 | 16 | 0.2%
16 | 13 | 0.1%
17 | 13 | 0.1%
18 | 23 | 0.2%
19 | 11 | 0.1%

Value | Count | Frequency (%)
139 | 1 | < 0.1%
138 | 1 | < 0.1%
134 | 1 | < 0.1%
132 | 1 | < 0.1%
131 | 6 | 0.1%
130 | 5 | < 0.1%
129 | 6 | 0.1%
128 | 10 | 0.1%
127 | 12 | 0.1%
126 | 10 | 0.1%

Total_Ct_Chng_Q4_Q1
Real number (ℝ≥0)

Distinct: 830
Distinct (%): 8.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7122223758
Minimum: 0
Maximum: 3.714
Zeros: 7
Zeros (%): 0.1%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.368
Q1: 0.582
median: 0.702
Q3: 0.818
95-th percentile: 1.069
Maximum: 3.714
Range: 3.714
Interquartile range (IQR): 0.236

Descriptive statistics

Standard deviation: 0.2380860913
Coefficient of variation (CV): 0.3342861716
Kurtosis: 15.6892929
Mean: 0.7122223758
Median Absolute Deviation (MAD): 0.119
Skewness: 2.064030568
Sum: 7212.676
Variance: 0.05668498689
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.667 | 171 | 1.7%
1 | 166 | 1.6%
0.5 | 161 | 1.6%
0.75 | 156 | 1.5%
0.6 | 113 | 1.1%
0.8 | 101 | 1.0%
0.714 | 92 | 0.9%
0.833 | 85 | 0.8%
0.778 | 69 | 0.7%
0.625 | 63 | 0.6%
Other values (820) | 8950 | 88.4%

Value | Count | Frequency (%)
0 | 7 | 0.1%
0.028 | 1 | < 0.1%
0.029 | 1 | < 0.1%
0.038 | 1 | < 0.1%
0.053 | 1 | < 0.1%
0.059 | 2 | < 0.1%
0.062 | 1 | < 0.1%
0.074 | 1 | < 0.1%
0.077 | 3 | < 0.1%
0.091 | 3 | < 0.1%

Value | Count | Frequency (%)
3.714 | 1 | < 0.1%
3.571 | 1 | < 0.1%
3.5 | 1 | < 0.1%
3.25 | 1 | < 0.1%
3 | 2 | < 0.1%
2.875 | 1 | < 0.1%
2.75 | 1 | < 0.1%
2.571 | 1 | < 0.1%
2.5 | 3 | < 0.1%
2.429 | 1 | < 0.1%

Avg_Utilization_Ratio
Real number (ℝ≥0)

ZEROS

Distinct: 964
Distinct (%): 9.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2748935519
Minimum: 0
Maximum: 0.999
Zeros: 2470
Zeros (%): 24.4%
Memory size: 79.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.023
median: 0.176
Q3: 0.503
95-th percentile: 0.793
Maximum: 0.999
Range: 0.999
Interquartile range (IQR): 0.48

Descriptive statistics

Standard deviation: 0.2756914693
Coefficient of variation (CV): 1.002902641
Kurtosis: -0.7949719515
Mean: 0.2748935519
Median Absolute Deviation (MAD): 0.176
Skewness: 0.7180079968
Sum: 2783.847
Variance: 0.07600578622
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 2470 | 24.4%
0.073 | 44 | 0.4%
0.057 | 33 | 0.3%
0.048 | 32 | 0.3%
0.06 | 30 | 0.3%
0.045 | 29 | 0.3%
0.061 | 29 | 0.3%
0.069 | 28 | 0.3%
0.059 | 28 | 0.3%
0.053 | 27 | 0.3%
Other values (954) | 7377 | 72.8%

Value | Count | Frequency (%)
0 | 2470 | 24.4%
0.004 | 1 | < 0.1%
0.005 | 1 | < 0.1%
0.006 | 3 | < 0.1%
0.007 | 1 | < 0.1%
0.008 | 2 | < 0.1%
0.009 | 1 | < 0.1%
0.01 | 1 | < 0.1%
0.011 | 1 | < 0.1%
0.012 | 4 | < 0.1%

Value | Count | Frequency (%)
0.999 | 1 | < 0.1%
0.995 | 1 | < 0.1%
0.994 | 1 | < 0.1%
0.992 | 1 | < 0.1%
0.99 | 1 | < 0.1%
0.988 | 1 | < 0.1%
0.987 | 1 | < 0.1%
0.985 | 1 | < 0.1%
0.984 | 1 | < 0.1%
0.983 | 4 | < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a perfectly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
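
The calculation described above can be sketched in plain Python. This is an illustrative implementation of the textbook formula, not the code the report generator runs, and `pearson_r` is a name chosen for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4]
zs = [2, 1, 4, 3]
r = pearson_r(xs, zs)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * z - 1 for z in zs])
print(r, r_rescaled)
```

The function assumes both inputs have the same length and non-zero variance; a constant column would lead to a division by zero.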

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
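
As a minimal sketch of that recipe, the example below ranks each variable (ties receive the average of their positions, a common convention assumed here) and then applies Pearson's formula to the ranks. The helper names are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values starting from 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but nonlinear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this monotonic but nonlinear example ρ reaches 1 while Pearson's r stays below 1, which is the behaviour the paragraph above describes.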

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
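
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition. Statistical libraries often default to a tie-adjusted variant (tau-b), so values may differ from library output when the data contain ties; which variant produced this report is not stated here.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither group
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

In the example, five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.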

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is linked from the original report.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013 (linked from the original report).
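
For illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table, following the commonly cited form of Bergsma's correction. It assumes every row and column total is non-zero and that the table has at least two rows and two columns. It is not taken from the report generator's source code, and `cramers_v_corrected` is a hypothetical name.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))


independent = cramers_v_corrected([[10, 20], [20, 40]])
perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(independent, perfect)
```

The two toy tables cover the extremes: proportional rows give 0 (independence) and a diagonal table gives 1 (perfect association).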

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

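(index) | CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio
0 | 768805383 | Existing Customer | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | 5 | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 | 42 | 1.625 | 0.061
1 | 818770008 | Existing Customer | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | 6 | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 | 33 | 3.714 | 0.105
2 | 713982108 | Existing Customer | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | 4 | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 | 20 | 2.333 | 0.000
3 | 769911858 | Existing Customer | 40 | F | 4 | High School | Unknown | Less than $40K | Blue | 34 | 3 | 4 | 1 | 3313.0 | 2517 | 796.0 | 1.405 | 1171 | 20 | 2.333 | 0.760
4 | 709106358 | Existing Customer | 40 | M | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | 5 | 1 | 0 | 4716.0 | 0 | 4716.0 | 2.175 | 816 | 28 | 2.500 | 0.000
5 | 713061558 | Existing Customer | 44 | M | 2 | Graduate | Married | $40K - $60K | Blue | 36 | 3 | 1 | 2 | 4010.0 | 1247 | 2763.0 | 1.376 | 1088 | 24 | 0.846 | 0.311
6 | 810347208 | Existing Customer | 51 | M | 4 | Unknown | Married | $120K + | Gold | 46 | 6 | 1 | 3 | 34516.0 | 2264 | 32252.0 | 1.975 | 1330 | 31 | 0.722 | 0.066
7 | 818906208 | Existing Customer | 32 | M | 0 | High School | Unknown | $60K - $80K | Silver | 27 | 2 | 2 | 2 | 29081.0 | 1396 | 27685.0 | 2.204 | 1538 | 36 | 0.714 | 0.048
8 | 710930508 | Existing Customer | 37 | M | 3 | Uneducated | Single | $60K - $80K | Blue | 36 | 5 | 2 | 0 | 22352.0 | 2517 | 19835.0 | 3.355 | 1350 | 24 | 1.182 | 0.113
9 | 719661558 | Existing Customer | 48 | M | 2 | Graduate | Single | $80K - $120K | Blue | 36 | 6 | 3 | 3 | 11656.0 | 1677 | 9979.0 | 1.524 | 1441 | 32 | 0.882 | 0.144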

Last rows

(index) | CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio
10117 | 712503408 | Existing Customer | 57 | M | 2 | Graduate | Married | $80K - $120K | Blue | 40 | 6 | 3 | 4 | 17925.0 | 1909 | 16016.0 | 0.712 | 17498 | 111 | 0.820 | 0.106
10118 | 713755458 | Attrited Customer | 50 | M | 1 | Unknown | Unknown | $80K - $120K | Blue | 36 | 6 | 3 | 4 | 9959.0 | 952 | 9007.0 | 0.825 | 10310 | 63 | 1.100 | 0.096
10119 | 716893683 | Attrited Customer | 55 | F | 3 | Uneducated | Single | Unknown | Blue | 47 | 4 | 3 | 3 | 14657.0 | 2517 | 12140.0 | 0.166 | 6009 | 53 | 0.514 | 0.172
10120 | 710841183 | Existing Customer | 54 | M | 1 | High School | Single | $60K - $80K | Blue | 34 | 5 | 2 | 0 | 13940.0 | 2109 | 11831.0 | 0.660 | 15577 | 114 | 0.754 | 0.151
10121 | 713899383 | Existing Customer | 56 | F | 1 | Graduate | Single | Less than $40K | Blue | 50 | 4 | 1 | 4 | 3688.0 | 606 | 3082.0 | 0.570 | 14596 | 120 | 0.791 | 0.164
10122 | 772366833 | Existing Customer | 50 | M | 2 | Graduate | Single | $40K - $60K | Blue | 40 | 3 | 2 | 3 | 4003.0 | 1851 | 2152.0 | 0.703 | 15476 | 117 | 0.857 | 0.462
10123 | 710638233 | Attrited Customer | 41 | M | 2 | Unknown | Divorced | $40K - $60K | Blue | 25 | 4 | 2 | 3 | 4277.0 | 2186 | 2091.0 | 0.804 | 8764 | 69 | 0.683 | 0.511
10124 | 716506083 | Attrited Customer | 44 | F | 1 | High School | Married | Less than $40K | Blue | 36 | 5 | 3 | 4 | 5409.0 | 0 | 5409.0 | 0.819 | 10291 | 60 | 0.818 | 0.000
10125 | 717406983 | Attrited Customer | 30 | M | 2 | Graduate | Unknown | $40K - $60K | Blue | 36 | 4 | 3 | 3 | 5281.0 | 0 | 5281.0 | 0.535 | 8395 | 62 | 0.722 | 0.000
10126 | 714337233 | Attrited Customer | 43 | F | 2 | Graduate | Married | Less than $40K | Silver | 25 | 6 | 2 | 4 | 10388.0 | 1961 | 8427.0 | 0.703 | 10294 | 61 | 0.649 | 0.189